[WIP] feature: shell integration 💻 #508

Draft
tnaum-ms wants to merge 126 commits into next from feature/shell-integration

tnaum-ms (Collaborator) commented Feb 17, 2026

Shell Integration — DocumentDB Query Language & Autocomplete

Umbrella PR for the shell integration feature: a custom documentdb-query Monaco language with intelligent autocomplete, hover docs, and validation across all query editor surfaces (filter, project, sort, aggregation, shell).

Work is organized as incremental steps, each delivered via a dedicated sub-PR merged into feature/shell-integration.


Progress

  • Step 1 — Schema Tool Decision — Evaluated schema analysis approaches, decided to enhance SchemaAnalyzer (JSON Schema output, incremental merge, 24 BSON types)
  • Step 2 — SchemaAnalyzer Refactoring · refactor: SchemaAnalyzer class + enhanced FieldEntry + new schema transformers #506 — Extracted @vscode-documentdb/schema-analyzer package, enriched FieldEntry with BSON types, added schema transformers, introduced monorepo structure
  • Step 3 — documentdb-constants Package · feat: add documentdb-constants package — operator metadata for autocomplete #513 — 308 operator entries (DocumentDB API query operators, update operators, stages, accumulators, BSON constructors, system variables) as static metadata for autocomplete
  • Step 3.5 — Monaco Language Architecture — Selected documentdb-query custom language with JS Monarch tokenizer (no TS worker), validated via POC across 8 test criteria
  • Step 4 — Filter CompletionItemProvider · feat: documentdb-query language — CompletionItemProvider, HoverProvider, acorn validation #518 — documentdb-query language registration, per-editor model URIs, completion data store, CompletionItemProvider (filter/project/sort), HoverProvider, acorn validation, $-prefix fix, query parser replacement (shell-bson-parser), type-aware operator sorting, legacy JSON Schema pipeline removal
  • Step 4.5 — Context-Sensitive Completions · feat: context-sensitive completions — cursor-aware filtering & type suggestions #530 ✅ — Cursor-position-aware filtering (key/value/operator/array-element), type-aware value suggestions (bool → true/false, number → range query, etc.), snippet escape fix, completion item styling, refactored into completions/ folder
  • Step 4.6 — Collection View & Autocompletion UX Improvements · feat: collection view & autocompletion UX improvements (Step 4.6) #532 — Clickable doc links, field completion persistence, $not fix, project/sort value completions, auto-trigger characters, smart-trigger, field hover provider, quoted key hover, hover link handler, category coverage tests
  • Step 5 — Legacy Scrapbook Removal
  • Step 6 — Scrapbook Rebuild (New Shell)
  • Step 7 — Shell CompletionItemProvider
  • Step 8 — Aggregation CompletionItemProvider

Key Architecture Decisions

| Decision | Outcome |
|---|---|
| Language strategy | documentdb-query custom language — JS Monarch tokenizer, no TS worker (~400-600 KB saved) |
| Completion providers | Single CompletionItemProvider + URI routing (documentdb://{editorType}/{sessionId}) |
| Completion data | documentdb-constants bundled at build time; field data pushed via tRPC subscription |
| Validation | acorn.parseExpressionAt() for syntax errors; acorn-walk + documentdb-constants for identifier validation |
| Document editors | Stay on language="json" with JSON Schema validation |
| Shell/Scrapbook (future) | language="javascript" with full TS service + .d.ts via addExtraLib() |
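
The URI routing decision above can be illustrated with a small parser. This is a minimal sketch, not the extension's code: `parseEditorUri`, `EditorRoute`, and the list of editor type names are hypothetical, chosen only to match the documentdb://{editorType}/{sessionId} shape and the editor surfaces named in the description.

```typescript
// Hypothetical helper: splits a model URI of the form
// documentdb://{editorType}/{sessionId} into its two parts.
// The editor type names below are assumptions based on the PR description.
type EditorType = 'filter' | 'project' | 'sort' | 'aggregation' | 'shell';

interface EditorRoute {
    editorType: EditorType;
    sessionId: string;
}

const EDITOR_TYPES: readonly EditorType[] = ['filter', 'project', 'sort', 'aggregation', 'shell'];

function isEditorType(value: string): value is EditorType {
    return (EDITOR_TYPES as readonly string[]).indexOf(value) !== -1;
}

function parseEditorUri(uri: string): EditorRoute | undefined {
    const match = /^documentdb:\/\/([^/]+)\/(.+)$/.exec(uri);
    if (!match) return undefined;
    const rawType = match[1];
    const sessionId = match[2];
    if (rawType === undefined || sessionId === undefined) return undefined;
    if (!isEditorType(rawType)) return undefined;
    return { editorType: rawType, sessionId };
}
```

A single provider could call a function like this on the model URI and branch on `editorType`, returning no suggestions for models that do not match the scheme.
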

Sub-PRs

| PR | Step | Title | Status |
|---|---|---|---|
| #506 | 2 | refactor: SchemaAnalyzer class + enhanced FieldEntry + new schema transformers | ✅ Merged |
| #513 | 3 | feat: add documentdb-constants package — operator metadata for autocomplete | ✅ Merged |
| #518 | 4 | feat: documentdb-query language — CompletionItemProvider, HoverProvider, acorn validation | ✅ Merged |
| #530 | 4.5 | feat: context-sensitive completions — cursor-aware filtering & type suggestions | ✅ Merged |
| #532 | 4.6 | feat: collection view & autocompletion UX improvements | ✅ Merged |

tnaum-ms and others added 27 commits February 16, 2026 20:16
… stats bugs

Group A of SchemaAnalyzer refactor:
- Fix A1: array element stats overwrite bug (isNewTypeEntry)
- Fix A2: probability >100% for array-embedded objects (x-documentsInspected)
- Rename folder: src/utils/json/mongo/ → src/utils/json/data-api/
- Rename enum: MongoBSONTypes → BSONTypes
- Rename file: MongoValueFormatters → ValueFormatters
- Add 9 new tests for array stats and probability

Group B of SchemaAnalyzer refactor:
- B1: SchemaAnalyzer class with addDocument(), getSchema(), reset(), getDocumentCount()
- B2: clone() method using structuredClone for schema branching
- B3: addDocuments() batch convenience method
- B4: static fromDocument()/fromDocuments() factories (replaces getSchemaFromDocument)
- B5: Migrate ClusterSession to use SchemaAnalyzer instance
- B6-B7: Remove old free functions (updateSchemaWithDocument, getSchemaFromDocument)
- Keep getPropertyNamesAtLevel, getSchemaAtPath, buildFullPaths as standalone exports
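
The class surface listed in Group B can be pictured with a heavily simplified stand-in. The sketch below is an assumption-laden illustration: `MiniSchemaAnalyzer` is hypothetical, it only counts top-level field occurrences instead of building a JSON Schema, and it mirrors only the method names mentioned above.

```typescript
// Simplified stand-in for illustration only. The real SchemaAnalyzer produces
// JSON Schema output; this version just counts top-level field occurrences.
type FieldCounts = Record<string, number>;
type PlainDocument = Record<string, unknown>;

class MiniSchemaAnalyzer {
    private counts: FieldCounts = {};
    private documentCount = 0;

    // Static factory, analogous to fromDocuments() in the commit message.
    static fromDocuments(docs: PlainDocument[]): MiniSchemaAnalyzer {
        const analyzer = new MiniSchemaAnalyzer();
        analyzer.addDocuments(docs);
        return analyzer;
    }

    addDocument(doc: PlainDocument): void {
        this.documentCount += 1;
        for (const key of Object.keys(doc)) {
            this.counts[key] = (this.counts[key] ?? 0) + 1;
        }
    }

    addDocuments(docs: PlainDocument[]): void {
        for (const doc of docs) this.addDocument(doc);
    }

    getSchema(): FieldCounts {
        return this.counts;
    }

    getDocumentCount(): number {
        return this.documentCount;
    }

    reset(): void {
        this.counts = {};
        this.documentCount = 0;
    }

    // Branch the accumulated state; structuredClone makes a deep copy.
    clone(): MiniSchemaAnalyzer {
        const copy = new MiniSchemaAnalyzer();
        copy.counts = structuredClone(this.counts);
        copy.documentCount = this.documentCount;
        return copy;
    }
}
```

Because `clone()` deep-copies the state, documents added to the copy do not affect the original, which is the property schema branching relies on.
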
…x properties type

Group C of SchemaAnalyzer refactor:
- C1: Add typed x-minValue, x-maxValue, x-minLength, x-maxLength, x-minDate,
  x-maxDate, x-trueCount, x-falseCount, x-minItems, x-maxItems,
  x-minProperties, x-maxProperties to JSONSchema interface
- C2: Fix properties type: properties?: JSONSchema → properties?: JSONSchemaMap
- C3: Fix downstream type errors in SchemaAnalyzer.test.ts (JSONSchemaRef casts)
…temBsonType

Group D of SchemaAnalyzer refactor:
- D1: Add bsonType to FieldEntry (dominant BSON type from x-bsonType)
- D2: Add bsonTypes[] for polymorphic fields (2+ distinct types)
- D3: Add isOptional flag (x-occurrence < parent x-documentsInspected)
- D4: Add arrayItemBsonType for array fields (dominant element BSON type)
- D5: Sort results: _id first, then alphabetical by path
- D6: Verified generateMongoFindJsonSchema still works (additive changes)
- G4: Add 7 getKnownFields tests covering all new fields
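
The D5 ordering rule (`_id` first, then alphabetical by path) can be sketched as a comparator. `FieldEntryLike` and `sortFieldEntries` are hypothetical names used only for this illustration; the real FieldEntry carries more properties.

```typescript
// Hypothetical, trimmed-down shape; the real FieldEntry has more fields.
interface FieldEntryLike {
    path: string;
    bsonType: string;
}

// Returns a sorted copy: _id first, then the remaining paths alphabetically.
function sortFieldEntries(entries: FieldEntryLike[]): FieldEntryLike[] {
    return entries.slice().sort((left, right) => {
        if (left.path === '_id') return right.path === '_id' ? 0 : -1;
        if (right.path === '_id') return 1;
        return left.path.localeCompare(right.path);
    });
}
```
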
… toFieldCompletionItems)

Group E of SchemaAnalyzer refactor:
- E1: generateDescriptions() — post-processor adding human-readable description
  strings with type info, occurrence percentage, and min/max stats
- E2: toTypeScriptDefinition() — generates TypeScript interface strings from
  JSONSchema for shell addExtraLib() integration
- E3: toFieldCompletionItems() — converts FieldEntry[] to CompletionItemProvider-
  ready FieldCompletionData[] with insert text escaping and $ references

Also:
- Rename isOptional → isSparse in FieldEntry and FieldCompletionData
  (all fields are implicitly optional in MongoDB API / DocumentDB API;
  isSparse is a statistical observation, not a constraint)
- Fix lint errors (inline type specifiers)
- 18 new tests for transformers + updated existing tests
- Add 5 tests for clone(), reset(), fromDocument(), fromDocuments(), addDocuments()
- Mark all checklist items A-G as complete, F1-F2 as deferred
- Add Manual Test Plan section (§14) with 5 end-to-end test scenarios
- Document clone() limitation with BSON Binary types (structuredClone)
- Add monotonic version counter to SchemaAnalyzer (incremented on mutations)
- Cache getKnownFields() with version-based staleness check
- Add ClusterSession.getKnownFields() accessor (delegates to cached analyzer)
- Wire collectionViewRouter to use session.getKnownFields() instead of standalone function
- Add ext.outputChannel.trace for schema accumulation and reset events
Co-authored-by: tnaum-ms <171359267+tnaum-ms@users.noreply.github.com>
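
The version-based staleness check described above can be sketched as follows. This is an illustrative model under stated assumptions: `VersionedFieldCache` is a hypothetical class, and the `computeCount` counter exists only to make the caching observable.

```typescript
// Illustrative model of a monotonic version counter with a cached derived view.
class VersionedFieldCache {
    private version = 0;
    private seen: Record<string, true> = {};
    private cached: string[] | undefined;
    private cachedVersion = -1;
    // Exposed only so the caching behavior can be observed in this sketch.
    computeCount = 0;

    addDocument(doc: Record<string, unknown>): void {
        for (const key of Object.keys(doc)) this.seen[key] = true;
        this.version += 1; // every mutation bumps the version
    }

    reset(): void {
        this.seen = {};
        this.version += 1;
    }

    getKnownFields(): string[] {
        // Recompute only when the version moved since the last computation.
        if (this.cached === undefined || this.cachedVersion !== this.version) {
            this.cached = Object.keys(this.seen).sort();
            this.cachedVersion = this.version;
            this.computeCount += 1;
        }
        return this.cached;
    }
}
```

Repeated reads between mutations return the cached array, so a caller that polls on every keystroke pays for recomputation only after new documents arrive.
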
Move SchemaAnalyzer, JSONSchema types, BSONTypes, ValueFormatters, and
getKnownFields into packages/schema-analyzer as @vscode-documentdb/schema-analyzer.

- Set up npm workspaces (packages/*) and TS project references
- Update all extension-side imports to use the new package
- Configure Jest multi-project for both extension and package tests
- Remove @vscode/l10n dependency from core (replaced with plain Error)
- Fix strict-mode type issues (localeCompare bug, index signatures)
- Update .gitignore to include root packages/ directory
- Add packages/ to prettier glob
…itions

The bsonToTypeScriptMap emits non-built-in type names (ObjectId, Binary,
Timestamp, etc.) without corresponding import statements or declare stubs.
Currently harmless since the output is for display/hover only, but should
be addressed if the TS definition is ever consumed by a real TS language
service.

Addresses PR #506 review comment from copilot.
…ion names

- Prefix with _ when PascalCase result starts with a digit (e.g. '123abc' → '_123abcDocument')
- Fall back to 'CollectionDocument' when name is empty or only separators
- Filter empty segments from split result
- Add tests for edge cases

Addresses PR #506 review comment from copilot.
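
The naming rules in this commit can be sketched as one function. `toInterfaceName` is a hypothetical name, and the choice of "any non-alphanumeric run" as the separator is an assumption; the `Document` suffix and the `CollectionDocument` fallback follow the examples in the commit message.

```typescript
// Hypothetical sketch of the interface-name rules described above.
function toInterfaceName(collectionName: string): string {
    const pascal = collectionName
        .split(/[^A-Za-z0-9]+/) // assumption: any non-alphanumeric run separates segments
        .filter((segment) => segment.length > 0) // drop empty segments
        .map((segment) => segment.charAt(0).toUpperCase() + segment.slice(1))
        .join('');

    // Empty or separator-only names fall back to a fixed default.
    if (pascal.length === 0) return 'CollectionDocument';

    // Identifiers cannot start with a digit, so prefix with an underscore.
    const safe = /^[0-9]/.test(pascal) ? `_${pascal}` : pascal;
    return `${safe}Document`;
}
```
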
Add comment explaining why the cast to JSONSchema is safe: our
SchemaAnalyzer never produces boolean schema refs. Notes that a
typeof guard should be added if the function is ever reused with
externally-sourced schemas.

Addresses PR #506 review comment from copilot.
…lashes

- Replace SPECIAL_CHARS_PATTERN with JS_IDENTIFIER_PATTERN for proper
  identifier validity check (catches dashes, brackets, digits, quotes, etc.)
- Escape embedded double quotes and backslashes when quoting insertText
- Add tests for all edge cases (dashes, brackets, digits, quotes, backslashes)
- Mark future-work item #1 as resolved; item #2 (referenceText/$getField)
  remains open for aggregation completion provider phase

Addresses PR #506 review comment from copilot.
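
A minimal sketch of the quoting rule: field names that are valid identifiers are inserted as-is, everything else is double-quoted with backslashes and quotes escaped. The regular expression below is an ASCII-only assumption and may differ from the actual JS_IDENTIFIER_PATTERN; `toInsertText` is a hypothetical name.

```typescript
// Assumption: ASCII-only identifier check; the real pattern may be broader.
const ASCII_IDENTIFIER_PATTERN = /^[A-Za-z_$][A-Za-z0-9_$]*$/;

function toInsertText(fieldName: string): string {
    if (ASCII_IDENTIFIER_PATTERN.test(fieldName)) return fieldName;
    // Escape backslashes first so the escapes added for quotes are not doubled.
    const escaped = fieldName.replace(/\\/g, '\\\\').replace(/"/g, '\\"');
    return `"${escaped}"`;
}
```
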
tnaum-ms added 30 commits March 18, 2026 09:20
These debug logs were used during development and logged full editor text and
completion items to the webview console on every keystroke. Removed to avoid
unnecessary noise and potential data exposure.

Documentation links like [DocumentDB Docs](https://...) were not rendered as
clickable hyperlinks in Monaco's completion detail panel. Monaco requires
{ value: string, isTrusted: true } on MarkdownStrings to enable link rendering.

Set isTrusted: true on operator documentation MarkdownStrings in
mapOperatorToCompletionItem. This is safe because the documentation content
comes entirely from documentdb-constants (operator descriptions we control),
not from user-generated content.

Previously, ClusterSession reset the SchemaAnalyzer when the user changed their
query. This meant queries returning 0 results left the autocompletion field list
empty. Now the SchemaAnalyzer accumulates field knowledge monotonically across
queries within the same session — new fields are added, type statistics enriched.

Trade-off: type statistics represent aggregated observations across all queries,
not a single query snapshot. This is acceptable since the UI shows approximate
type info (e.g., 'mostly String') rather than absolute percentages.

Added a future work discussion in docs/plan/future-work.md about potential
strategies for separating cumulative vs. per-query statistics if needed.

$not is a field-level operator (e.g., { price: { $not: { $gt: 1.99 } } }),
not a root-level logical combinator like $and/$or/$nor. It was incorrectly
included in KEY_POSITION_OPERATORS, causing it to appear at query root (where
it's invalid) and be hidden at operator position (where users need it).

Changes:
- Remove '$not' from KEY_POSITION_OPERATORS in completionKnowledge.ts
- Update JSDoc to document why $not is excluded
- Update tests: expect $not at operator position, not at key position
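
The effect of the change can be sketched with a small position filter. The list below is an illustrative subset, not the contents of completionKnowledge.ts, and `operatorsForPosition` is a hypothetical helper.

```typescript
// Illustrative subset; $not is intentionally absent because it is field-level.
const KEY_POSITION_OPERATORS_SKETCH: readonly string[] = ['$and', '$or', '$nor', '$comment', '$expr'];

// Key position shows only root-level operators; operator position hides them.
function operatorsForPosition(allOperators: string[], position: 'key' | 'operator'): string[] {
    return allOperators.filter((op) => {
        const isKeyOperator = KEY_POSITION_OPERATORS_SKETCH.indexOf(op) !== -1;
        return position === 'key' ? isKeyOperator : !isKeyOperator;
    });
}
```
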
At value position in the project editor, show 1 (include) and 0 (exclude)
instead of filter-specific completions (operators, BSON constructors, etc.).
At value position in the sort editor, show 1 (ascending) and -1 (descending).

These are the most common values for projection and sort fields. Projection
operators like $slice and $elemMatch remain available via operator-position
completions for advanced use cases.
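
As a rough sketch, the value-position suggestions for these two editors reduce to a lookup. `ValueSuggestion` and `valueSuggestionsFor` are hypothetical names; the real completion items carry more Monaco-specific properties.

```typescript
interface ValueSuggestion {
    label: string;
    detail: string;
}

// Returns the fixed value suggestions for the project and sort editors.
function valueSuggestionsFor(editorType: 'project' | 'sort'): ValueSuggestion[] {
    if (editorType === 'project') {
        return [
            { label: '1', detail: 'include' },
            { label: '0', detail: 'exclude' },
        ];
    }
    return [
        { label: '1', detail: 'ascending' },
        { label: '-1', detail: 'descending' },
    ];
}
```
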
When the user clears the editor content (removing the initial '{ }'), field
completions now insert '{ fieldName: $1 }' instead of 'fieldName: $1' to
produce valid query syntax. Operator snippets already include their own braces
and are not double-wrapped.

A 'needsWrapping' flag is computed in registerLanguage.ts by checking whether
the editor text contains a '{' character. The flag is true when no '{' is
present; in that case, field completions in the 'all completions' fallback
path get wrapped with outer braces.
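
A minimal sketch of this behavior, assuming the flag is simply "no opening brace anywhere in the text". `computeNeedsWrapping` and `fieldSnippet` are hypothetical helper names.

```typescript
// True when the editor holds no '{' at all, e.g. after the user cleared it.
function computeNeedsWrapping(editorText: string): boolean {
    return editorText.indexOf('{') === -1;
}

// Builds the snippet for a field completion; $1 is the snippet tab stop.
function fieldSnippet(fieldName: string, needsWrapping: boolean): string {
    const body = `${fieldName}: $1`;
    return needsWrapping ? `{ ${body} }` : body;
}
```
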
Added ':', ',', and '[' to the completion provider's triggerCharacters list.
These positions are already handled by the cursor context parser (value after
':', new key after ',', array element after '[') but previously required
manual Ctrl+Space invocation.

Added string-literal detection (isCursorInsideString) to suppress completions
when trigger characters appear inside string values. Uses a forward scan
counting unescaped quotes to determine if the cursor is inside a string.
When inside a string, returns empty suggestions to prevent the popup.
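
The forward scan can be sketched as below. This is one possible variant, not the code from the PR: instead of counting quotes it tracks which quote character is currently open, which also handles single quotes nested in double-quoted strings. The name carries a `Sketch` suffix to mark it as hypothetical.

```typescript
// Scans from the start of the text up to the cursor and reports whether the
// cursor sits inside a single- or double-quoted string literal.
function isCursorInsideStringSketch(text: string, cursorOffset: number): boolean {
    let openQuote: string | undefined;
    const end = Math.min(cursorOffset, text.length);
    for (let i = 0; i < end; i++) {
        const ch = text.charAt(i);
        if (openQuote !== undefined) {
            if (ch === '\\') {
                i++; // skip the escaped character so \" does not close the string
                continue;
            }
            if (ch === openQuote) openQuote = undefined;
        } else if (ch === '"' || ch === "'") {
            openQuote = ch;
        }
    }
    return openQuote !== undefined;
}
```

A provider could call this before building suggestions and return an empty list when it yields true, so that a ':' typed inside "a:b" does not open the popup.
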
Extended the HoverProvider to show type information when hovering over field
names in the query editor. When a field is recognized from the completion
store (populated by SchemaAnalyzer), the hover shows:
- Field name (bold)
- BSON type (e.g., Number, String, Date)
- Sparse indicator when the field is not present in all documents

Operators/BSON constructors take priority over field names to avoid
ambiguity. Statistics use relative language ('sparse') rather than
absolute numbers since the SchemaAnalyzer accumulates data across queries.

Also set isTrusted: true on operator hover content to make doc links
clickable (consistent with the completion documentation fix).
…ypes

Three hover provider fixes:

1. Quoted string keys: Hover now works for quoted field names like
   {"address.street": 1}. Monaco's getWordAtPosition treats quotes/dots
   as word boundaries, so a new extractQuotedKey helper manually extracts
   the full quoted key from the line content.

2. isTrusted on field hovers: Field hover content now has isTrusted: true,
   making any future links in field hovers clickable. (Operator hovers
   already had this from a previous commit.)

3. Redesigned field hover format:
   - Field name bold, with 'sparse: not present in all documents' in
     subscript on the same line (no em-dashes)
   - 'Inferred Types' bold section header
   - Comma-separated type list (using displayTypes from all observed
     BSON types for polymorphic fields)

Also threaded bsonTypes/displayTypes through FieldCompletionData from
FieldEntry for polymorphic field support.

Added tests verifying that operator categories appear at the correct
completion positions:
- Key position: only logical combinators ($and, $or, $nor) and meta
  operators ($comment, $expr, etc.) — no field-level operators
- Value position: all field-level categories — comparison ($gt, $eq),
  evaluation ($regex), element ($exists, $type), array ($all,
  $elemMatch, $size), and field-level $not
- Operator position: same as value position

This confirms $all is correctly excluded from key position (it's a
field-level array operator: { tags: { $all: [...] } }), not a root-level
query combinator.

When a field completion inserts 'rating: ', the completion popup did not
reappear for the value position. Now, typing a space after ':' or ','
triggers the suggestion popup after 50ms. This provides a smooth
autocomplete flow: select field → space → see value suggestions.

Implemented via onDidChangeModelContent listener that detects single-space
insertions preceded by ':' or ',' and programmatically calls
editor.action.triggerSuggest. Wired into all three editors (filter,
project, sort) with proper cleanup.
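
The detection step can be isolated as a pure predicate. In this sketch, `ContentChangeLike` models only two of the properties found on Monaco's content change objects (`text` and `rangeLength`), and `shouldTriggerSuggest` is a hypothetical name; the 50ms delay and the call to editor.action.triggerSuggest are left to the caller.

```typescript
// Minimal subset of a model content change, enough for this check.
interface ContentChangeLike {
    text: string; // inserted text
    rangeLength: number; // number of characters replaced (0 for a pure insert)
}

// True for a single-space insertion that directly follows ':' or ','.
function shouldTriggerSuggest(change: ContentChangeLike, textBeforeChange: string): boolean {
    if (change.text !== ' ' || change.rangeLength !== 0) return false;
    const previous = textBeforeChange.charAt(textBeforeChange.length - 1);
    return previous === ':' || previous === ',';
}
```

Keeping the predicate separate from the listener wiring makes it testable without a Monaco editor instance.
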
Monaco renders hover markdown links as <a> tags, but the webview CSP
blocks direct navigation to external URLs. Added a delegated click handler
on the query editor container that intercepts <a> clicks with http/https
hrefs and routes them through the existing trpcClient.common.openUrl
mutation, which calls vscode.env.openExternal on the extension host side.

EMPTY editor (no braces) now shows key-position completions (fields +
root operators) with { } wrapping, instead of showing all operators.
UNKNOWN context remains as the full discovery fallback.

Changes:
- createCompletionItems: route needsWrapping+unknown to new
  createEmptyEditorCompletions (key-position items with wrapping)
- createAllCompletions: now pure UNKNOWN fallback (no needsWrapping param)
- New tdd/ folder with behavior spec (readme.completionBehavior.md) and
  26 category-based TDD tests verifying the completion matrix
- Updated existing category tests: KEY position allows 'evaluation'
  (because $expr/$text are key-position operators), UNKNOWN now shows
  everything
- Updated completions/README.md: added Empty position, fixed flow docs

The TDD tests check categories (from description label) and sortText
prefixes, not specific operator names, for resilience to
documentdb-constants changes.
…pletions

Added `standalone?: boolean` to `OperatorEntry`. When `false`, the operator
is excluded from completion lists but remains in the registry for hover docs.

Operators marked as standalone: false:
- Geospatial sub-operators: $box, $center, $centerSphere, $geometry,
  $maxDistance, $minDistance, $polygon (only valid inside $geoWithin/$near)
- Positional projection: $ (not a standalone filter/sort operator)
- Sort modifier: $natural (not valid as a filter value)

Changes:
- packages/documentdb-constants/src/types.ts: added `standalone` field
- packages/documentdb-constants/scripts/generate-from-reference.ts: parse
  `- **Standalone:** false` from overrides, emit in generated code.
  Also fixed bitwise BSON type from 'int' to 'int32' to match SchemaAnalyzer.
- packages/documentdb-constants/resources/overrides/operator-overrides.md:
  added standalone: false overrides for 9 operators
- src/webviews/documentdbQuery/completions/createCompletionItems.ts: filter
  `e.standalone !== false` in all three completion builders (all/value/operator)
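
The filter itself is small. The sketch below uses a hypothetical, trimmed-down `OperatorEntryLike` shape; entries without a `standalone` property are treated as standalone, matching the `!== false` check above.

```typescript
// Trimmed-down shape for illustration; the real OperatorEntry has more fields.
interface OperatorEntryLike {
    value: string;
    standalone?: boolean;
}

// Keeps entries unless they are explicitly marked standalone: false.
function completableOperators(entries: OperatorEntryLike[]): OperatorEntryLike[] {
    return entries.filter((e) => e.standalone !== false);
}
```
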
…ab stops

ESC key handling (MonacoEditor.tsx):
- Added context precondition '!suggestWidgetVisible && !inSnippetMode'
  to the Escape command so Monaco's built-in handlers dismiss the
  suggest widget or exit snippet mode before our handler fires.

Tab key handling (MonacoAutoHeight.tsx):
- Replaced onKeyDown Tab interception with addAction using
  precondition '!inSnippetMode'. During snippet tab-stop navigation,
  Monaco's built-in Tab handler takes over. After the snippet session
  ends (final tab stop or ESC), Tab reverts to moving focus out.

Field names originate from user database schema and should not be
rendered as trusted markdown. This change:
- Removes isTrusted from field hovers (keeps supportHtml for formatting)
- Escapes markdown metacharacters in field names and type strings
- Adds escapeMarkdown utility in src/webviews/utils/ for reuse
- Updates tests accordingly
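
A possible shape for such an escaping utility is shown below. The character set is an assumption and may not match the one in src/webviews/utils/; the function name carries a `Sketch` suffix to mark it as hypothetical.

```typescript
// Prefixes common Markdown metacharacters with a backslash so that
// user-controlled field names render as literal text.
// Assumption: this character set is illustrative, not the PR's exact list.
function escapeMarkdownSketch(text: string): string {
    return text.replace(/[\\`*_{}\[\]()#+\-.!|<>~]/g, '\\$&');
}
```

With escaping in place, a field named like a Markdown link is shown as plain characters in the hover instead of being rendered as a link.
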
Moves extractQuotedKey and tryMatchAsClosingQuote from registerLanguage.ts
into a dedicated extractQuotedKey.ts module. This decouples the pure string
helper from the Monaco registration wiring, making tests less brittle and
enabling easier reuse.

Development

Successfully merging this pull request may close these issues.

Improve Scrapbook Experience (shell integration) 🚀

2 participants